
    Save up to 99% of your time in mapping validation

    Identifying semantic correspondences between different vocabularies has been recognized as a fundamental step towards achieving interoperability. Several manual and automatic techniques have recently been proposed. Fully manual approaches are very precise, but extremely costly. Conversely, automatic approaches tend to fail when domain-specific background knowledge is needed, and consequently they typically require a manual validation step. Yet, when the number of computed correspondences is very large, the validation phase can be very expensive. To reduce these problems, we propose to compute the minimal set of correspondences, which we call the minimal mapping, that is sufficient to compute all the others. We show that by concentrating on such correspondences we can save up to 99% of the manual checks required for validation.
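
    To illustrate the idea, here is a minimal sketch, in Python, of the redundancy check that makes such savings possible: once a small set of mapping elements has been validated by hand, any candidate that follows from them along the two hierarchies can be accepted without a manual check. All names are illustrative; this is not the paper's implementation.

```python
# Minimal sketch of the redundancy check behind the savings
# (illustrative names, not the paper's implementation). Hierarchies
# are child -> parent dicts; a mapping element (a, b) is read as
# "source node a is more specific than target node b".

def up(node, parent):
    """Return node plus all of its ancestors in a child -> parent tree."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain

def is_redundant(cand, minimal, parent1, parent2):
    """(c, d) follows from a validated (a, b) when c is at or below a
    and d is at or above b, since then c <= a <= b <= d. Analogous
    rules exist for the other semantic relations."""
    c, d = cand
    return any((a, b) != (c, d)
               and a in up(c, parent1)
               and d in up(b, parent2)
               for a, b in minimal)
```

    Under a rule like this, only the non-redundant candidates reach a human validator; everything else is accepted mechanically, which is where the reported savings come from.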

    DISI - Via Sommarive 14 - 38123 Povo

    Handling everyday tasks such as search, classification and integration is becoming increasingly difficult, and sometimes even impossible, due to the growing streams of available data. To overcome this information overload we need more accurate information processing tools capable of handling large amounts of data. In particular, handling metadata can give us leverage over the data and enable its structured processing; however, while some of this metadata is in a computer-readable format, some of it is manually created in ambiguous natural language. Thus, accessing the semantics of natural language can increase the quality of information processing. We propose a natural language metadata understanding architecture that enables applications such as semantic matching, classification and search based on natural language metadata by providing a translation into a formal language, and which outperforms the state of the art by 15%.
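
    As a rough illustration of what such a translation produces, the sketch below reduces the architecture to a single hypothetical step: a label becomes a conjunction of word senses. sense_of() stands in for real word sense disambiguation against a controlled vocabulary; here it just fabricates a placeholder sense identifier.

```python
# Hypothetical one-step reduction of the architecture: a label becomes
# a conjunction of word senses. sense_of() stands in for real word
# sense disambiguation against a controlled vocabulary.

STOPWORDS = {"and", "or", "of", "the", "in", "for"}

def sense_of(word):
    return f"{word}#n#1"   # placeholder: "first noun sense" of the word

def label_to_formula(label):
    tokens = [t.lower().strip(",") for t in label.split()]
    content = [t for t in tokens if t not in STOPWORDS]
    return " & ".join(sense_of(t) for t in content)

print(label_to_formula("Arts and Crafts"))  # arts#n#1 & crafts#n#1
```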

    Large Scale Semantic Matching: Agrovoc vs CABI

    Achieving semantic interoperability is a difficult problem with many challenges yet to be addressed. These include matching large-scale data sets, tackling the problem of missing background knowledge, evaluating large-scale results, tuning the matching process, and doing all of the above in a realistic setting with resource and time constraints. In this paper we report the results of a large-scale matching experiment performed on domain-specific resources: two agricultural thesauri. We share our experience concerning the above-mentioned aspects of semantic matching, discuss the results, draw conclusions and outline promising directions for future work.

    Computing minimal and redundant mappings between lightweight ontologies

    The minimal mapping between two lightweight ontologies contains the minimal subset of mapping elements from which all the others can be efficiently computed. Minimal mappings have clear advantages in visualization and user interaction, since they are the minimal amount of information that needs to be dealt with, making the work of the user much easier, faster and less error prone. Experiments on our proposed algorithm to compute them demonstrate a substantial improvement both in run time and in the number of elements found.
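
    The expansion direction, computing all the other elements from the minimal ones, can be sketched as follows, under the simplifying assumption that elements carry only a subsumption relation. The names and data layout are illustrative only.

```python
# Sketch of regenerating the full mapping from a minimal one
# (illustrative; restricted to the subsumption relation). The source
# tree is a node -> children dict, the target tree a child -> parent dict.

def down(node, children):
    """Return node plus all of its descendants."""
    out, stack = {node}, list(children.get(node, []))
    while stack:
        n = stack.pop()
        out.add(n)
        stack.extend(children.get(n, []))
    return out

def up(node, parent):
    """Return node plus all of its ancestors."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain

def expand(minimal, children1, parent2):
    """Each (c, d) with c at or below a and d at or above b follows
    from the minimal element (a, b)."""
    full = set()
    for a, b in minimal:
        full |= {(c, d) for c in down(a, children1) for d in up(b, parent2)}
    return full
```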

    Descriptive Phrases: Understanding Natural Language Metadata

    The fast development of information and communication technologies has made available vast amounts of heterogeneous information. With these amounts growing faster and faster, information integration and search technologies are becoming key to the success of the information society. To handle such amounts efficiently, data needs to be leveraged and analysed at deeper levels. Metadata is a traditional way of getting leverage over the data. Deeper levels of analysis include language analysis, starting from purely string-based (keyword) approaches, continuing with syntax-based approaches, and now semantics is about to be included in the processing loop. Often natural language, being the easiest means of expression, is used in metadata; we call such metadata "natural language metadata". Examples include various titles, captions and labels, such as web directory labels, picture titles, classification labels and business directory category names. These short pieces of text usually describe (sets of) objects; we call them "descriptive phrases". This thesis deals with the problem of understanding natural language metadata for its further use in semantics-aware applications. It contributes by portraying descriptive phrases, using the results of the analysis of several collected and annotated datasets of natural language metadata. It provides an architecture for natural language metadata understanding, complete with the algorithms and the implementation, and it contains the evaluation of the proposed architecture.

    Lightweight Parsing of Classifications

    Understanding metadata written in natural language is a crucial requirement for the successful automated integration of large-scale, language-rich classifications such as the ones used in digital libraries. In this article we analyze the natural language labels used in such classifications by exploring their syntactic structure, and we then show how this structure can be used to detect patterns of language that can be processed by a lightweight parser whose average accuracy is 96.82%. This allows for a deep understanding of natural language metadata semantics. In particular, we show how we improve the accuracy of the automatic translation of classifications into lightweight ontologies by almost 18% with respect to the previously used approach. The automatic translation is required by applications such as semantic matching, search and classification algorithms.
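
    The sketch below gives a flavor of pattern-based label parsing, not the published grammar: labels are reduced to coarse part-of-speech sequences, which are then matched against a handful of shapes. tag() is a stand-in for a real part-of-speech tagger.

```python
# Flavor of pattern-based label parsing (not the published grammar):
# labels are reduced to coarse part-of-speech sequences and matched
# against a handful of shapes. tag() is a stand-in for a real tagger.
import re

def tag(token):
    if token.lower() in {"and", "or"}:
        return "CC"                      # conjunction
    if token.lower() in {"of", "in", "for", "by"}:
        return "IN"                      # preposition
    return "NN"                          # everything else as a noun

PATTERNS = [
    ("conjunction", re.compile(r"^NN( NN)* CC NN( NN)*$")),  # Arts and Crafts
    ("qualified",   re.compile(r"^NN( NN)* IN NN( NN)*$")),  # History of Science
    ("simple",      re.compile(r"^NN( NN)*$")),              # Computer Science
]

def parse_label(label):
    tags = " ".join(tag(t) for t in label.split())
    for name, pattern in PATTERNS:
        if pattern.match(tags):
            return name
    return None   # fall back to a heavier parser

print(parse_label("History of Science"))  # qualified
```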

    Summarization of Concepts for Visual Disambiguation

    Controlled vocabularies that power semantic applications allow them to operate with high precision, which comes at the price of having to disambiguate between senses of terms. Fully automatic disambiguation is a largely unsolved problem, and semi-automatic approaches are preferred. These approaches involve users in the disambiguation and require an adequate user interface. However, term definitions are usually lengthy; they not only occupy valuable screen space, but reading and understanding them requires the user's attention and time. As an alternative to using definitions, we propose to use a summary: a "single word" disambiguation label for a concept. In this paper we present an algorithm to summarize concepts from a controlled vocabulary. We evaluate the algorithm with 51 users and show that it generates summaries with good discriminative and associative qualities. In addition, the length of the summaries is comparable to the length of the original terms, making the algorithm particularly useful in situations where screen real estate is limited.
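
    As a hedged illustration of what such a summarization step might look like, the sketch below picks, from the gloss of a target sense, a word that the competing senses of the same term do not share. The tie-breaking heuristic (word length as a crude rarity proxy) is a placeholder, not the evaluated algorithm.

```python
# Hedged illustration of one-word concept summarization: prefer a gloss
# word that competing senses of the same term do not share. The length
# heuristic is a placeholder, not the evaluated algorithm.

def summarize(target, competitors):
    """Return a one-word label separating `target` from `competitors`."""
    others = set().union(*(set(s["gloss"]) for s in competitors))
    candidates = [w for w in target["gloss"] if w not in others]
    pool = candidates or target["gloss"]
    return max(pool, key=len)

java_island = {"gloss": ["island", "indonesia", "coffee"]}
java_language = {"gloss": ["programming", "language", "coffee"]}
print(summarize(java_island, [java_language]))  # indonesia
```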

    Computing minimal mappings

    Given two classifications, or lightweight ontologies, we compute the minimal mapping, namely the subset of all possible correspondences, called mapping elements, between them such that (i) all the others can be computed from them in time linear in the size of the input ontologies, and (ii) none of them can be dropped without losing property (i). In this paper we provide a formal definition of minimal mappings and define a time-efficient computation algorithm which minimizes the number of comparisons between the nodes of the two input ontologies. The experimental results show a substantial improvement both in computation time and in the number of mapping elements which need to be handled.
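
    A compact sketch of the comparison-saving idea, with illustrative names and without the paper's full rule set: before invoking the expensive node matcher on a pair, check whether the pair is already entailed by an element found earlier; entailed pairs can be derived later instead of computed.

```python
# Compact sketch of the comparison-saving idea (illustrative, not the
# paper's exact algorithm): invoke the expensive node matcher only on
# pairs not already entailed by an element found earlier. Trees are
# child -> parent dicts; node_match() stands in for a SAT-based matcher
# returning True when the first node is more specific than the second.

def up(node, parent):
    """Return node plus all of its ancestors in a child -> parent tree."""
    chain = {node}
    while node in parent:
        node = parent[node]
        chain.add(node)
    return chain

def match_trees(nodes1, nodes2, parent1, parent2, node_match):
    minimal = set()
    for a in nodes1:                      # assumed visited top-down
        for b in nodes2:
            entailed = any(x in up(a, parent1) and b in up(y, parent2)
                           for x, y in minimal)
            if not entailed and node_match(a, b):
                minimal.add((a, b))       # a is more specific than b
    return minimal
```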

    S-Match: an open source framework for matching lightweight ontologies

    Achieving automatic interoperability among systems with diverse data structures and languages expressing different viewpoints is a goal that has been difficult to accomplish. This paper describes S-Match, an open source semantic matching framework that tackles the semantic interoperability problem by transforming several data structures, such as business catalogs, web directories, conceptual models and web service descriptions, into lightweight ontologies and establishing semantic correspondences between them. The framework is the first open source semantic matching project that includes three different algorithms tailored for specific domains and provides an extensible API for developing new algorithms, including the possibility to plug in specific background knowledge according to the characteristics of each application domain. Published in Semantic Web, IOS Press, 2011: http://iospress.metapress.com/content/57661w1u1n712531/
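
    To make the pipeline concrete without claiming anything about the actual S-Match Java API, here is a hypothetical sketch of the three steps such matching frameworks share: per-label formulas, per-node formulas built along the path to the root, and a reasoning call that decides the semantic relation between two node formulas.

```python
# Hypothetical sketch of the matching pipeline (this is NOT the S-Match
# Java API): labels become formulas, the meaning of a node is the
# conjunction of formulas on its path to the root, and a reasoning step
# decides the relation between two node meanings.

def label_formula(label):
    """Placeholder for the linguistic step that turns a label into a
    formula (see the sense-based sketch earlier in this listing)."""
    return label.lower().replace(" ", "_")

def node_formula(node, parent):
    """Conjoin label formulas from the root down to `node`;
    `parent` is a child -> parent dict."""
    parts = []
    while node is not None:
        parts.append(label_formula(node))
        node = parent.get(node)
    return " & ".join(reversed(parts))

def relate(f1, f2, prover):
    """`prover` stands in for a SAT check of propositional entailment."""
    if prover(f"({f1}) -> ({f2})"):
        return "more specific than"
    if prover(f"({f2}) -> ({f1})"):
        return "more general than"
    return None
```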

    Lightweight Parsing of Classifications into Lightweight Ontologies

    Understanding metadata written in natural language is a prerequisite for the successful automated integration of large-scale, language-rich classifications such as the ones used in digital libraries. We analyze the natural language labels within classifications by exploring their syntactic structure, and we then show how this structure can be used to detect patterns of language that can be processed by a lightweight parser with an average accuracy of 96.82%. This allows for a deeper understanding of natural language metadata semantics, which we show can improve by almost 18% the accuracy of the automatic translation of classifications into the lightweight ontologies required by semantic matching, search and classification algorithms.